On Differentially Private Frequent Itemsets Mining
Authors
Abstract
Frequent itemsets mining finds sets of items that frequently appear together in a database. Publishing this information, however, can have privacy implications. In this paper we therefore consider the problem of guaranteeing differential privacy for frequent itemsets mining. We measure the utility of a frequent itemsets mining algorithm by its likelihood of producing a complete and sound result, where “completeness” requires the algorithm to include the “sufficiently” frequent itemsets and “soundness” requires it to exclude the “sufficiently” infrequent ones. We prove that it is hard to satisfy both differential privacy and a non-trivial utility requirement simultaneously. However, we find that on benchmark datasets [19] we can produce reasonably accurate results while still guaranteeing differential privacy by truncating transactions to limit their cardinality.
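The truncation idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes uniform random truncation, a publicly known item universe, and Laplace noise on single-item counts only. The names truncate, laplace, noisy_item_counts, max_len and universe are hypothetical. Once every transaction holds at most max_len items, adding or removing one transaction changes the vector of item counts by at most max_len in L1 norm, so Laplace noise of scale max_len / epsilon is enough for epsilon-differential privacy of those counts.

```python
import random
from collections import Counter


def truncate(transaction, max_len, rng):
    """Keep at most max_len distinct items, chosen uniformly at random.

    Assumption: random truncation; other truncation rules are possible.
    """
    items = sorted(set(transaction))
    if len(items) <= max_len:
        return items
    return rng.sample(items, max_len)


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_item_counts(transactions, universe, max_len, epsilon, seed=None):
    """Return a noisy support count for every item in the public universe.

    Each truncated transaction touches at most max_len counts, so the L1
    sensitivity of the count vector is max_len.
    """
    rng = random.Random(seed)
    counts = Counter()
    for t in transactions:
        counts.update(truncate(t, max_len, rng))
    scale = max_len / epsilon
    # Noise is added for every item in the universe, including unseen ones,
    # so the set of reported items does not depend on the data.
    return {item: counts[item] + laplace(scale, rng) for item in universe}


if __name__ == "__main__":
    data = [["bread", "milk", "eggs", "tea"], ["bread", "milk"], ["tea"]]
    universe = ["bread", "eggs", "milk", "tea"]
    print(noisy_item_counts(data, universe, max_len=2, epsilon=1.0, seed=0))
```

A smaller max_len lowers the noise but discards more items from long transactions, which is the trade-off that truncation introduces.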
Similar resources
Privacy Preserving Private Frequent Itemset Mining via Smart Splitting
Recently there has been a growing interest in designing differentially private data mining algorithms. A variety of algorithms have been proposed for mining frequent itemsets. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. It has practical importance in a wide range of application areas such as decision support, web usage mining, bioinformatics, etc. In th...
On differentially private frequent itemset mining
We consider differentially private frequent itemset mining. We begin by exploring the theoretical difficulty of simultaneously providing good utility and good privacy in this task. While our analysis proves that in general this is very difficult, it leaves a glimmer of hope in that our proof of difficulty relies on the existence of long transactions (that is, transactions containing many items)...
Candidate Pruning-Based Differentially Private Frequent Itemsets Mining
Frequent Itemsets Mining (FIM) is a typical data mining task and has gained much attention. Because of concerns about individual privacy, various studies have focused on privacy-preserving FIM problems. Differential privacy has emerged as a promising scheme for protecting individual privacy in data mining against adversaries with arbitrary background knowledge. In this paper, we present ...
A Survey on Mining High Utility Itemsets from Transactional Databases
Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility like profits. Frequent itemset mining (FIM) is one of the most fundamental problems in data mining. In this work, we propose a novel strategy based on the analysis of item co-occurrences to reduce the number of join operations that need to be performed (FHM: Faster High-Utility Miner...
MINING FUZZY TEMPORAL ITEMSETS WITHIN VARIOUS TIME INTERVALS IN QUANTITATIVE DATASETS
This research aims at proposing a new method for discovering frequent temporal itemsets in continuous subsets of a dataset with quantitative transactions. It is important to note that although these temporal itemsets may have relatively high support or occurrence within particular time intervals, they do not necessarily get similar support across the whole dataset, which makes i...